† Corresponding author. E-mail:
Identifying influential nodes in complex networks is essential for network robust and stability, such as viral marketing and information control. Various methods have been proposed to define the influence of nodes. In this paper, we comprehensively consider the global position and local structure to identify influential nodes. The number of iterations in the process of k-shell decomposition is taken into consideration, and the improved k-shell decomposition is then put forward. The improved k-shell decomposition and degree of target node are taken as the benchmark centrality, in addition, as is well known, the effect between node pairs is inversely proportional to the shortest path length between two nodes, and then we also consider the effect of neighbors on target node. To evaluate the performance of the proposed method, susceptible-infected (SI) model is adopted to simulate the spreading process in four real networks, and the experimental results show that the proposed method has obvious advantages over classical centrality measures in identifying influential nodes.
Complex networks have received increasing attention in recent years. A number of systems in real life can be represented abstractly by complex networks, such as social network,[1,2] traffic system,[3,4] power grid,[5,6] and biological network.[7,8] A small number of special nodes, called influential nodes, will have a significant influence on the structure and function of the network. Structural damage or functional failure of influential nodes will cause the entire network to crash. Therefore, research on identifying influential nodes is of great significance for improving the robustness of systems and designing an efficient system structure. On the one hand, we can control and protect influential nodes to stabilize and secure the complex systems. For example, we can control some famous users in social network to prevent rumors from spreading, finding and protecting important airports in aviation network can avoid the large area delay of routes, protecting key substations in power grid can prevent large-scale power outages. On the other hand, we can also attack influential nodes to destroy the structure and function of a network. For instance, the Stuxnet virus, developed by the United States and Israel, damaged critical facilities of Iran’s nuclear power network and successfully attacked and controlled Iran’s nuclear power network.
Research of influential nodes identification has two lines: (i) influence ranking of individual node and (ii) identifying a group of nodes to maximize the influence. In the present paper, we focus on the former. Up to now, various centrality measures have been proposed to address this issue, such as degree centrality (DC),[9] closeness centrality (CC),[10] betweenness centrality (BC),[11] eigenvector centrality,[12] local centrality (LC),[13] k-shell decomposition (Ks),[14] etc. Degree centrality is a well-known local metric that determines the influence of node by the number of its nearest neighbors, but it only considers limited local information and ignores the effect of global information. Closeness centrality and betweenness centrality are classical measures belonging to global metric. They have high time complexity because they need to obtain the topological characteristics of the entire network, and thus they cannot be applied to large scale networks. To make a tradeoff between local metric and global metric, Chen et al.[13] proposed a local centrality to identify influential nodes in which the target node’s neighbors within 4-hops are taken into consideration. Kitsak et al.[14] found that the influence of node is determined by node’s location in the network and proposed k-shell decomposition centrality. The k-shell decomposition strips the nodes in the outer layer, and the nodes in inner layer are influential nodes in the network. However, this method is a coarse-grained sorting algorithm and cannot further distinguish the influences of nodes. Then many methods were put forward to modify the k-shell decomposition. Zeng and Zhang[15] considered the degree information of removing nodes and put forward the mixed degree decomposition method. Lin et al.[16] considered neighbors’ k-core value and proposed an improved k-shell decomposition method. In the field of search engine, a number of approaches have been put forward to identify influential nodes such as PageRank,[17] LeaderRank,[18] and Hits,[19] in these iterative refinement methods the influence of neighbors is taken into account. Note that centrality measure cannot work well from a single view, then some researchers indicated that combining several centrality measures to identify influential nodes can obtain more accuracy ranking results. The technique for order preference by similarity to an ideal object (TOPSIS) method was used to identify influential nodes in two actual networks,[20] Yang et al.[21] used Vlsekriterijumska Optimizacija I Kompromisno Resenje (VIKOR) method to combine several classical centrality measures, Dempster–Shafer evidence theory was used to comprehensively combine degree centrality, betweenness centrality, and closeness centrality.[22] The global position and local degree variation were both considered to identify influential nodes, and entropy method was used to assign weights to the two aspects.[23] Wang et al.[24] combined the improved k-shell decomposition and the neighborhood’s effect to give an influential nodes identifying method with low time complexity and high accuracy. Zhao et al.[25] considered both local centrality and clustering coefficient, and provided a new method to identify influential nodes. In a word, it is still an open issue to establish a model for identifying the influential nodes.
In this paper, a novel method of identifying influential nodes, called GLI, is proposed based on global and local information. The greatest issue with k-shell decomposition is that it produces many nodes with the same ranking, we consider the number of iterations in the process of k-shell decomposition and put forward the improved k-shell decomposition. The improved k-shell decomposition and degree of target node are chosen as the benchmark centrality, and it is considered in the GLI that the effect between nodes will decrease with the increase of the shortest path length between nodes, in addition, neighbors within 3-hops are considered. To evaluate the performance of GLI, the susceptible-infected (SI) model[26] is utilized to simulate the spreading process in four real networks, and other classical centrality measures (DC, CC, BC, and Ks) are employed to compare the GLI. The superiority of GLI is demonstrated according to four experiments.
The rest of this paper is organized as follows. The related work is introduced in Section
An undirected and unweighted graph is represented as G = ( V,E,A ), V = { v1,v2,…,vn } denotes the set of nodes and E = { e1, e2, …,em } refers to the set of edges, A = { aij } is the adjacency matrix of G: if node i and node j are connected, aij = 1; otherwise, aij = 0.
Degree centrality[9] is the simplest and most famous centrality for identifying influential nodes, but it only considers the number of its nearest neighbors, which is defined as
Closeness centrality[10] of node i is defined as the reciprocal of the sum of the shortest distances from node i to others in the network, it is expressed as
Betweenness centrality[11] is a classical global metric, which calculates the number of the shortest paths passing through the node. It is defined as
The k-shell decomposition[14] assigns an index (ks) to each node. Assume that there is no isolated node in the network, and take Fig.
In Fig.
The improved k-shell decomposition in Fig.
Various centrality measures have been proposed to determine an actor to be ‘influential’ from their perspectives, which will lead to some limitations for each measure. However, the accuracy and effectiveness of influential nodes identification can be improved by considering the influence of nodes from multiple perspectives. The global position and degree are classical centrality measures which consider global information and local information respectively, and we propose to integrate these two centrality measures. Besides, the main limitation of k-shell decomposition is that it leads to a few nodes with same ranking, then we propose the improved k-shell decomposition based on the number of iterations. What is more, we also consider the effect of neighbors, in fact, the effect between node pairs is inversely proportional to the shortest path length between two nodes. Therefore, we propose a novel measure based on global and local information (denoted as GLI). The GLI considers that the effect between nodes will decrease with the increase of the shortest path length between nodes, the improved k-shell decomposition and degree are chosen as the benchmark centrality, and we consider the neighbors within 3-hops. GLI is defined as
The GLI considers the global position and local structure, and the neighbors within 3-hops are also taken into consideration. We provide a new method to identify influential nodes from our perspective. Compared with classical centrality measures, GLI can give very accuracy and effective ranking results.
To verify the efficiency of GLI, we conduct the experiments with the following four real networks. They are (i) Karate club network,[27] which is a social network describing the relationship among 34 members of a Karate Club in the 1970s; (ii) Jazz musicians network,[28] which is a collaboration network among 198 Jazz musicians, with each edge representing two musicians play together in a band; (iii) USAir97 network, which is an aviation network with each node denoting an airport, and each edge representing a route between two airports; (iv) Email network,[29] which is a social network with each node denoting a user, and each edge representing e-mail interchange between members.
We employ susceptible-infected (SI) model[26] to simulate the spreading process and evaluate the performance of GLI. Each node in the SI model is either in susceptible state or in infected state. Nodes in susceptible state will be infected by nodes in infected state with a certain probability, and once susceptible nodes are infected, they cannot be recovered. Suppose that only one node (node i) in the network is infected at t = 0, namely, node i is the source node of spreading process, and all the other nodes are in susceptible state. Then the source node infects its neighbors with a certain probability, and at the same time, the disease or information begins to spread through the network. The number of infected nodes will be nit after t (t = 1,2,…) time step, and we define the spreading ability of node i as Ii (t) = nit/n, besides, we set the maximum time step to be t = 50. To eliminate random error, we run 1000 independent simulations.
The Kendall’s tau coefficient[30] is adopted to measure the correlation coefficient between centrality measures and the actual ranking simulated by SI model. Suppose that there are two ranking lists X and Y, (x1,y1 ), (x2, y2 ), …, (xn, yn ) is a set of joint ranks from the two ranking lists. A pair of distinct nodes i and j is concordant if ((xi > xj ) and (yi > yj )) or ((xi > xj ) and (yi < yj )). If ((xi > xj ) and (yi < yj )) or ((xi < xj ) and (xi > xj )), then the node pair is said to be discordant. And the Kendall’s tau coefficient τ is defined as
We rank the influences of nodes in four networks with GLI, and other classical centrality measures including DC, CC, BC, and Ks are applied to the same networks for comparison. We choose the top-10 lists generated by each method. The results to be ranked are presented in Table
According to Table
When ranking the influence of nodes with different methods, we find that some nodes are given the same ranking, which makes it impossible to distinguish them. In addition, the indistinguishable degrees of nodes generated by different methods are not the same. Therefore, we focus on the frequencies of nodes with the same ranking of each method in four networks, the higher the frequency, the more the nodes in the same ranking are and the worse the distinguishing ability. Figure
To further analyze the distinguishing ability of different methods, we introduce the monotonicity index M(R) and it is defined as[31]
From Table
In this part, we compare the average spreading ability of top-10 nodes ranked by each method in four networks. The average spreading ability of top-10 nodes is defined as
According to Fig.
In this part, we pay attention to the correlation between methods and the actual ranking list. The actual ranking list γ simulated by SI model is taken as an actual spreading situation, and the other ranking lists generated by each method are compared with γ. Figure
In this paper, various methods are proposed to identify influential nodes in complex networks from different perspectives, each method has its own viewpoint to define an actor to be ‘influential’. We comprehensively consider global and local information to identify influential nodes, and the GLI method is then put forward. Firstly, we improve k-shell decomposition through considering the number of iterations in the process of k-shell decomposition, and the improved k-shell decomposition is considered as a global structure. Then, degree centrality is considered as a local structure, and the combination of global and local structure is taken as the benchmark centrality. Finally, we consider the effect of neighbors within 3-hops on the target node, and GLI method is established. The GLI comprehensively combines global position and local degree, thus providing a more accurate and effective approach to identifying the influential nodes. We utilize the SI model to simulate the spreading process in network, and four experiments are conducted to demonstrate the superiority of GLI. Experimental results show that the GLI method has obvious advantages over the classical centrality measures.
[1] | |
[2] | |
[3] | |
[4] | |
[5] | |
[6] | |
[7] | |
[8] | |
[9] | |
[10] | |
[11] | |
[12] | |
[13] | |
[14] | |
[15] | |
[16] | |
[17] | |
[18] | |
[19] | |
[20] | |
[21] | |
[22] | |
[23] | |
[24] | |
[25] | |
[26] | |
[27] | |
[28] | |
[29] | |
[30] | |
[31] |